Compression Picks The Significant Item Sets
Authors
Abstract
Finding a comprehensive set of patterns that truly captures the characteristics of a database is a complicated matter. Frequent item set mining attempts this, but low support levels often result in exorbitant numbers of item sets. Recently we showed that by using MDL we are able to select a small number of item sets that compress the data well [15]. Here we show that this small set is a good approximation of the underlying data distribution. Using the small set in an MDL-based classifier leads to performance on par with well-known rule-induction and association-rule based methods. Advantages are that no parameters need to be set manually and only very few item sets are used. The classification scores indicate that selecting item sets through compression is an elegant way of mining interesting patterns that can subsequently find use in many applications.
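The abstract does not spell out how the MDL-based classifier works, so the following is only a minimal sketch of one plausible reading: each class gets its own code table of item sets, and an unseen transaction is assigned to the class whose code table encodes it in the fewest bits. The greedy cover order, the additive smoothing, and every function name and data value here are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def make_code_table(itemsets, alphabet):
    """Order item sets largest first, then add all singletons (assumed order)."""
    multi = sorted((frozenset(s) for s in itemsets), key=len, reverse=True)
    singles = [frozenset([item]) for item in sorted(alphabet)]
    return multi + singles


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping item sets."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def fit(transactions, itemsets, alphabet, smoothing=1.0):
    """Derive code lengths (in bits) from smoothed usage counts."""
    code_table = make_code_table(itemsets, alphabet)
    usage = Counter({s: smoothing for s in code_table})
    for t in transactions:
        usage.update(cover(t, code_table))
    total = sum(usage.values())
    lengths = {s: -math.log2(u / total) for s, u in usage.items()}
    return code_table, lengths


def encoded_length(transaction, code_table, lengths):
    """Total bits needed to encode one transaction with one code table."""
    return sum(lengths[s] for s in cover(transaction, code_table))


def classify(transaction, models):
    """Pick the class whose code table gives the shortest encoding."""
    return min(models, key=lambda c: encoded_length(transaction, *models[c]))


# Toy data, invented purely for illustration.
alphabet = {"a", "b", "c", "d"}
models = {
    "A": fit([frozenset("ab"), frozenset("abc"), frozenset("ab")], ["ab"], alphabet),
    "B": fit([frozenset("cd"), frozenset("acd"), frozenset("cd")], ["cd"], alphabet),
}
print(classify(frozenset("ab"), models))
print(classify(frozenset("cd"), models))
```

In this toy setting, a transaction containing a class's frequently used item set receives a short code under that class's table and a longer one elsewhere, which is the intuition behind using compression as a classifier.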
Similar resources
Data Deduplication in Parallel Mining of Frequent Item sets using MapReduce
FiDoop is a parallel frequent item set mining algorithm that uses the MapReduce programming model. FiDoop incorporates the frequent items ultrametric tree (FIU-tree), and three MapReduce jobs are applied to complete the mining task. The scalability problem has been addressed by the implementation of a handful of FP-growth-like parallel FIM algorithms. In FiDoop, the mappers independently and concurren...
Compression Cluster Based Efficient k-Medoid Algorithm to Increase Scalability
The experiments are conducted on both synthetic and real data sets. The synthetic data sets used in our experiments were generated using the procedure. We refer readers to it for more details on the generation of large data sets. We report experimental results on two synthetic more data sets in this data set; the average transaction of size and its average maximal potentially freq...
Tree Based Space Partition of Trajectory Pattern Mining For Frequent Item Sets
Transaction Database (TD) is an extension of frequent item set mining in large static of data mining field. The dynamic and continuously evolving nature of the database requires up hMinor algorithm, hCount and lossy coun explosion of patterns. A fixed window length and decay factor are required to implement the explosion model. The scanning and the support evaluation for item sets are fast. Hence, the...
Automatic S-Wave Picker for Local Earthquake Tomography
High-resolution seismic tomography at local and regional scales requires large and consistent sets of arrival-time data. Algorithms combining accurate picking with an automated quality classification can be used for repicking waveforms and compiling large arrival-time data sets suitable for tomographic inversion. S-wave velocities represent a key parameter for petrological interpretation, impro...
An Efficient Frequent Pattern Mining Algorithm to Find the Existence of K-Selective Interesting Patterns in Large Dataset Using SIFPMM
Association rule mining in huge databases is one of the most popular data exploration techniques for business decision makers. Discovering frequent item sets is the fundamental process in association rule mining. Several algorithms were introduced in the literature to find frequent patterns. Those algorithms discover all combinations of frequent item sets for a given minimum support threshold. But som...